MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Squeakr: an exact and approximate k-mer counting system.

Internal identifier: 000804 (Main/Exploration); previous: 000803; next: 000805

Authors: Prashant Pandey [United States]; Michael A. Bender [United States]; Rob Johnson [United States]; Rob Patro [United States]; Bonnie Berger

Source: Bioinformatics (Oxford, England), 2018

RBID : pubmed:29444235

Abstract

k-mer-based algorithms have become increasingly popular in the processing of high-throughput sequencing data. These algorithms span the gamut of the analysis pipeline from k-mer counting (e.g. for estimating assembly parameters), to error correction, genome and transcriptome assembly, and even transcript quantification. Yet, these tasks often use very different k-mer representations and data structures. In this article, we show how to build a k-mer-counting and multiset-representation system using the counting quotient filter, a feature-rich approximate membership query data structure. We introduce the k-mer-counting/querying system Squeakr (Simple Quotient filter-based Exact and Approximate Kmer Representation), which is based on the counting quotient filter. This off-the-shelf data structure turns out to be an efficient (approximate or exact) representation for sets or multisets of k-mers.

DOI: 10.1093/bioinformatics/btx636
PubMed: 29444235
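
To make the abstract's notion of an exact or approximate multiset of k-mers concrete, here is a minimal Python sketch. It is not Squeakr's implementation: a plain `Counter` stands in for the counting quotient filter, and the use of canonical k-mers (the smaller of a k-mer and its reverse complement), the function names, and the BLAKE2b-based fingerprint are all assumptions made for illustration. The approximate variant keys counts by a truncated hash, so distinct k-mers may collide and a reported count can overestimate the true count but never underestimate it.

```python
import hashlib
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_DNA = set("ACGT")


def canonical(kmer):
    """Return the smaller of a k-mer and its reverse complement (assumed convention)."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def canonical_kmers(reads, k):
    """Yield the canonical form of every length-k window made only of A, C, G, T."""
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= _DNA:  # skip windows containing N or other symbols
                yield canonical(kmer)


def count_exact(reads, k):
    """Exact multiset: one counter entry per distinct canonical k-mer."""
    return Counter(canonical_kmers(reads, k))


def fingerprint(kmer, bits):
    """Hypothetical fingerprint: the low `bits` bits of a BLAKE2b digest."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big") & ((1 << bits) - 1)


def count_approx(reads, k, bits):
    """Approximate multiset: counts keyed by fingerprint, so collisions merge counts."""
    return Counter(fingerprint(kmer, bits) for kmer in canonical_kmers(reads, k))


reads = ["ACGTAC", "GTACGT"]
exact = count_exact(reads, 3)
approx = count_approx(reads, 3, bits=4)

for kmer, n in sorted(exact.items()):
    # The approximate count is always at least the exact count.
    print(kmer, n, approx[fingerprint(kmer, 4)])
```

Shrinking `bits` saves space per entry at the cost of more collisions, which is the general trade-off an approximate membership query structure makes.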



The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Squeakr: an exact and approximate k-mer counting system.</title>
<author>
<name sortKey="Pandey, Prashant" sort="Pandey, Prashant" uniqKey="Pandey P" first="Prashant" last="Pandey">Prashant Pandey</name>
<affiliation wicri:level="2">
<nlm:affiliation>Department of Computer Science, Stony Brook University, Stony Brook, NY 11790, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science, Stony Brook University, Stony Brook, NY 11790</wicri:regionArea>
<placeName>
<region type="state">État de New York</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Bender, Michael A" sort="Bender, Michael A" uniqKey="Bender M" first="Michael A" last="Bender">Michael A. Bender</name>
<affiliation wicri:level="2">
<nlm:affiliation>Department of Computer Science, Stony Brook University, Stony Brook, NY 11790, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science, Stony Brook University, Stony Brook, NY 11790</wicri:regionArea>
<placeName>
<region type="state">État de New York</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Johnson, Rob" sort="Johnson, Rob" uniqKey="Johnson R" first="Rob" last="Johnson">Rob Johnson</name>
<affiliation wicri:level="2">
<nlm:affiliation>Department of Computer Science, Stony Brook University, Stony Brook, NY 11790, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science, Stony Brook University, Stony Brook, NY 11790</wicri:regionArea>
<placeName>
<region type="state">État de New York</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Patro, Rob" sort="Patro, Rob" uniqKey="Patro R" first="Rob" last="Patro">Rob Patro</name>
<affiliation wicri:level="2">
<nlm:affiliation>Department of Computer Science, Stony Brook University, Stony Brook, NY 11790, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science, Stony Brook University, Stony Brook, NY 11790</wicri:regionArea>
<placeName>
<region type="state">État de New York</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Berger, Bonnie" sort="Berger, Bonnie" uniqKey="Berger B" first="Bonnie" last="Berger">Bonnie Berger</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2018">2018</date>
<idno type="RBID">pubmed:29444235</idno>
<idno type="pmid">29444235</idno>
<idno type="doi">10.1093/bioinformatics/btx636</idno>
<idno type="wicri:Area/PubMed/Corpus">000996</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000996</idno>
<idno type="wicri:Area/PubMed/Curation">000996</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">000996</idno>
<idno type="wicri:Area/PubMed/Checkpoint">000763</idno>
<idno type="wicri:explorRef" wicri:stream="Checkpoint" wicri:step="PubMed">000763</idno>
<idno type="wicri:Area/Ncbi/Merge">001D35</idno>
<idno type="wicri:Area/Ncbi/Curation">001D35</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">001D35</idno>
<idno type="wicri:Area/Main/Merge">000807</idno>
<idno type="wicri:Area/Main/Curation">000804</idno>
<idno type="wicri:Area/Main/Exploration">000804</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Squeakr: an exact and approximate k-mer counting system.</title>
<author>
<name sortKey="Pandey, Prashant" sort="Pandey, Prashant" uniqKey="Pandey P" first="Prashant" last="Pandey">Prashant Pandey</name>
<affiliation wicri:level="2">
<nlm:affiliation>Department of Computer Science, Stony Brook University, Stony Brook, NY 11790, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science, Stony Brook University, Stony Brook, NY 11790</wicri:regionArea>
<placeName>
<region type="state">État de New York</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Bender, Michael A" sort="Bender, Michael A" uniqKey="Bender M" first="Michael A" last="Bender">Michael A. Bender</name>
<affiliation wicri:level="2">
<nlm:affiliation>Department of Computer Science, Stony Brook University, Stony Brook, NY 11790, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science, Stony Brook University, Stony Brook, NY 11790</wicri:regionArea>
<placeName>
<region type="state">État de New York</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Johnson, Rob" sort="Johnson, Rob" uniqKey="Johnson R" first="Rob" last="Johnson">Rob Johnson</name>
<affiliation wicri:level="2">
<nlm:affiliation>Department of Computer Science, Stony Brook University, Stony Brook, NY 11790, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science, Stony Brook University, Stony Brook, NY 11790</wicri:regionArea>
<placeName>
<region type="state">État de New York</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Patro, Rob" sort="Patro, Rob" uniqKey="Patro R" first="Rob" last="Patro">Rob Patro</name>
<affiliation wicri:level="2">
<nlm:affiliation>Department of Computer Science, Stony Brook University, Stony Brook, NY 11790, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science, Stony Brook University, Stony Brook, NY 11790</wicri:regionArea>
<placeName>
<region type="state">État de New York</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Berger, Bonnie" sort="Berger, Bonnie" uniqKey="Berger B" first="Bonnie" last="Berger">Bonnie Berger</name>
</author>
</analytic>
<series>
<title level="j">Bioinformatics (Oxford, England)</title>
<idno type="eISSN">1367-4811</idno>
<imprint>
<date when="2018" type="published">2018</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithms</term>
<term>Animals</term>
<term>Eukaryota (genetics)</term>
<term>Gene Expression Profiling (methods)</term>
<term>Genome</term>
<term>Genomics (methods)</term>
<term>High-Throughput Nucleotide Sequencing (methods)</term>
<term>Humans</term>
<term>Sequence Analysis, DNA (methods)</term>
<term>Sequence Analysis, RNA (methods)</term>
<term>Software</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr">
<term>Algorithmes</term>
<term>Analyse de profil d'expression de gènes ()</term>
<term>Analyse de séquence d'ADN ()</term>
<term>Analyse de séquence d'ARN ()</term>
<term>Animaux</term>
<term>Eucaryotes (génétique)</term>
<term>Génome</term>
<term>Génomique ()</term>
<term>Humains</term>
<term>Logiciel</term>
<term>Séquençage nucléotidique à haut débit ()</term>
</keywords>
<keywords scheme="MESH" qualifier="genetics" xml:lang="en">
<term>Eukaryota</term>
</keywords>
<keywords scheme="MESH" qualifier="génétique" xml:lang="fr">
<term>Eucaryotes</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en">
<term>Gene Expression Profiling</term>
<term>Genomics</term>
<term>High-Throughput Nucleotide Sequencing</term>
<term>Sequence Analysis, DNA</term>
<term>Sequence Analysis, RNA</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Algorithms</term>
<term>Animals</term>
<term>Genome</term>
<term>Humans</term>
<term>Software</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr">
<term>Algorithmes</term>
<term>Analyse de profil d'expression de gènes</term>
<term>Analyse de séquence d'ADN</term>
<term>Analyse de séquence d'ARN</term>
<term>Animaux</term>
<term>Génome</term>
<term>Génomique</term>
<term>Humains</term>
<term>Logiciel</term>
<term>Séquençage nucléotidique à haut débit</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">k-mer-based algorithms have become increasingly popular in the processing of high-throughput sequencing data. These algorithms span the gamut of the analysis pipeline from k-mer counting (e.g. for estimating assembly parameters), to error correction, genome and transcriptome assembly, and even transcript quantification. Yet, these tasks often use very different k-mer representations and data structures. In this article, we show how to build a k-mer-counting and multiset-representation system using the counting quotient filter, a feature-rich approximate membership query data structure. We introduce the k-mer-counting/querying system Squeakr (Simple Quotient filter-based Exact and Approximate Kmer Representation), which is based on the counting quotient filter. This off-the-shelf data structure turns out to be an efficient (approximate or exact) representation for sets or multisets of k-mers.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>États-Unis</li>
</country>
<region>
<li>État de New York</li>
</region>
</list>
<tree>
<noCountry>
<name sortKey="Berger, Bonnie" sort="Berger, Bonnie" uniqKey="Berger B" first="Bonnie" last="Berger">Bonnie Berger</name>
</noCountry>
<country name="États-Unis">
<region name="État de New York">
<name sortKey="Pandey, Prashant" sort="Pandey, Prashant" uniqKey="Pandey P" first="Prashant" last="Pandey">Prashant Pandey</name>
</region>
<name sortKey="Bender, Michael A" sort="Bender, Michael A" uniqKey="Bender M" first="Michael A" last="Bender">Michael A. Bender</name>
<name sortKey="Johnson, Rob" sort="Johnson, Rob" uniqKey="Johnson R" first="Rob" last="Johnson">Rob Johnson</name>
<name sortKey="Patro, Rob" sort="Patro, Rob" uniqKey="Patro R" first="Rob" last="Patro">Rob Patro</name>
</country>
</tree>
</affiliations>
</record>
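
The record above can be read programmatically. The following Python sketch uses the standard-library `xml.etree.ElementTree` on a trimmed fragment of the record. One caveat: the `wicri:` and `nlm:` prefixes are not declared in the record as shown, so a strict XML parser rejects it as-is. The wrapper element and its namespace URIs below are placeholders invented for this example, not the real Wicri or NLM namespaces, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed fragment of the record; element and attribute names are copied from it.
FRAGMENT = """<record><TEI><teiHeader><fileDesc><titleStmt>
<title xml:lang="en">Squeakr: an exact and approximate k-mer counting system.</title>
<author>
<name sortKey="Pandey, Prashant" first="Prashant" last="Pandey">Prashant Pandey</name>
<affiliation wicri:level="2">
<nlm:affiliation>Department of Computer Science, Stony Brook University, Stony Brook, NY 11790, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
</affiliation>
</author>
<author>
<name sortKey="Berger, Bonnie" first="Bonnie" last="Berger">Bonnie Berger</name>
</author>
</titleStmt></fileDesc></teiHeader></TEI></record>"""

# Placeholder namespace URIs (hypothetical), declared only so the prefixes resolve.
WRAPPER = ('<wrapper xmlns:wicri="urn:example:wicri" '
           'xmlns:nlm="urn:example:nlm">{}</wrapper>')


def parse_record(fragment):
    """Wrap the fragment so its prefixes are bound, then return the <record> element."""
    return ET.fromstring(WRAPPER.format(fragment)).find("record")


def authors_with_country(record):
    """Return (name, country) pairs; country is None when no affiliation is given."""
    pairs = []
    for author in record.iter("author"):
        name = author.findtext("name")
        country = author.findtext("affiliation/country")
        pairs.append((name, country))
    return pairs


record = parse_record(FRAGMENT)
title = record.findtext(".//titleStmt/title")
authors = authors_with_country(record)

print(title)
for name, country in authors:
    print(name, "|", country)
```

Authors without an `<affiliation>` element, such as Bonnie Berger in this record, come back with `None` for the country, which mirrors the `<noCountry>` branch of the affiliation tree.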

To work with this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000804 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000804 | SxmlIndent | more

To link to this page from within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     pubmed:29444235
   |texte=   Squeakr: an exact and approximate k-mer counting system.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Exploration/RBID.i   -Sk "pubmed:29444235" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021